The Varieties of Capitalism (VoC) literature has been a classic body of theory for decades and has secured a firm place in the field of political economy. It centers on two prominent models: Liberal Market Economies (LMEs) and Coordinated Market Economies (CMEs). The LME model is exemplified by the US and the UK, while the CME model is represented by Sweden and Denmark. The main difference is that, in an LME, a company's profits mainly depend on market supply and demand, whereas companies in a CME build their core competitiveness largely through non-market relationships.
At the national scale, the United States is classified as a Liberal Market Economy, while at the state scale it is likely to resemble the CME model.
According to VoC theory, the United States is defined as a liberal market economy, and one characteristic of this market type is that union density is positively related to the vocational training level. The previously presented model contains 9 variables from the dataset (one outcome and eight predictors) and is defined as the linear model shown below:
The summary of the model shows the coefficients, p-values, and other information.
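Written out from the `lm` call in the summary that follows, the model would take roughly the form below. The index \(i\) (one observation per state), the coefficients \(\beta_0, \dots, \beta_8\) and the error term \(\varepsilon_i\) are notation added here for illustration; the variable names follow the model formula.

\[\text{Highschool}_i = \beta_0 + \beta_1\,\text{UnionDensity}_i + \beta_2\,\text{GDP}_i + \beta_3\,\text{Urbanity}_i + \beta_4\,\text{Population}_i + \beta_5\,\text{Export}_i + \beta_6\,\text{UnemploymentRate}_i + \beta_7\,\text{CostOfLiving}_i + \beta_8\,\text{Gini}_i + \varepsilon_i\]

The question examined throughout is the sign and significance of \(\beta_1\).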
##
## Call:
## lm(formula = Highschool ~ Union_density + GDP + Urbanity + Population +
## Export + Unemployment_rate + Cost_of_living + Gini, data = data3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -261208 -64360 -25121 37190 514055
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.754e+05 5.014e+05 0.948 0.34852
## Union_density -1.692e+04 5.694e+03 -2.971 0.00489 **
## GDP 6.629e-02 1.358e+00 0.049 0.96129
## Urbanity -1.045e+03 1.826e+03 -0.572 0.57028
## Population 2.804e-02 3.834e-03 7.313 5.2e-09 ***
## Export -5.176e-01 5.463e-01 -0.947 0.34882
## Unemployment_rate 1.909e+04 2.453e+04 0.778 0.44086
## Cost_of_living 1.241e+03 1.669e+03 0.743 0.46136
## Gini -9.598e+05 1.142e+06 -0.840 0.40546
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 144200 on 42 degrees of freedom
## Multiple R-squared: 0.67, Adjusted R-squared: 0.6072
## F-statistic: 10.66 on 8 and 42 DF, p-value: 5.037e-08
According to the summary of the linear model, the coefficient of union density is negative and significant (-1.692e+04, p = 0.00489), which means that as union density rises, the high school variable (the proxy for vocational training) decreases correspondingly. This apparently conflicts with the properties of an LME in VoC theory. A potential explanation is that racial diversity in each state might lead to this negative relationship between vocational training and union density.
Market-based management: Market-Based Management (MBM) enables organizations to succeed in the long term by applying the principles that allow free societies to prosper. Just as upholding values such as free speech, property rights, and progress is important to a healthy, growing society, it is also pivotal in fostering a healthy, growing organization.
Nation level/state level: The US is called an LME because its national governing framework does not have institutions that aggregate preferences for public goods, such as high-skill development.
Descriptions:
Dataset details:
- 51 observations (the 50 US states plus Washington DC) and 31 variables.
- Dependent variable included in the model: high school (represents the vocational training level).
- Independent variables included in the model: union density, racial composition (white/black/hispanic), GDP, Urbanity, Population, Export, Cost of Living, Unemployment Rate, Gini.
To address the research question, the following steps are needed: a data check, an initial EDA (exploratory data analysis), and a model check.
- To better measure racial diversity, a new variable based on White/Black/Hispanic is created.
- In the initial EDA, scatterplots and a bubble plot are presented to show the rough correlation between union density and vocational training and to get a general idea of the question.
- Transformations are applied to the linear regression model to see whether the results become more reasonable.
- Component residual plots (CR plots) are drawn to check the relationship between each independent variable and the response variable.
- A partial correlation test is used to check whether a significant negative relationship between the two variables remains when controlling for the other 7 confounding variables.
- Mediation analysis is conducted to check whether racial diversity can be regarded as an intermediate factor between union density and vocational training.
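The report does not say which mediation method was used, so the following is only a generic product-of-coefficients sketch in base R, run on simulated data. The data frame `toy`, its columns, and every number in it are hypothetical and are not taken from the report's dataset.

```r
# Hypothetical simulated data standing in for union density, racial
# diversity (the candidate mediator) and the vocational training outcome.
set.seed(5)
n <- 51
union <- runif(n, 2, 25)
diversity <- 0.6 - 0.015 * union + rnorm(n, sd = 0.1)
y <- 11 + 1.5 * diversity - 0.01 * union + rnorm(n, sd = 0.5)
toy <- data.frame(union, diversity, y)

# Three regressions of a classic mediation analysis.
total  <- lm(y ~ union, data = toy)              # total effect (c)
med    <- lm(diversity ~ union, data = toy)      # union -> mediator (a)
direct <- lm(y ~ union + diversity, data = toy)  # direct effect (c') and mediator -> y (b)

a        <- unname(coef(med)["union"])
b        <- unname(coef(direct)["diversity"])
indirect <- a * b                                # effect carried through the mediator
c_total  <- unname(coef(total)["union"])
c_direct <- unname(coef(direct)["union"])

# For ordinary least squares the decomposition is exact:
# total effect = direct effect + indirect effect.
print(c(total = c_total, direct = c_direct, indirect = indirect))
```

If racial diversity really mediated the relationship, the indirect effect would account for a substantial share of the total effect; a formal test would also need a standard error for `a * b`, for example from a bootstrap.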
Concerns:
- Sources: Bureau of Labor Statistics, Wikipedia, U.S. Census Bureau, and the Census Bureau's March Current Population Survey.
- Years: from 2016 to 2018. The data are collected from several different sources and years, which raises the concern that the data may not be reliable enough because the variables might not be consistent with one another. The analysis would be more dependable if the data were more consistent.
- Variables: Of the 9 variables included in the model, the variable high school stands for the vocational training level. At the state scale, where governments need more funding to meet supply and demand, high school is a representative measure, but it would be better to also include other sites of vocational training, such as high-technology companies and industries that train workers to meet their own supply and demand. Additionally, the variables white/black/hispanic do not show racial diversity directly; therefore, a new variable based on these three variables is created and is discussed later in this report.
Part II data: description of the new variables
| Variable Name | Interpretation | Time Span |
|---|---|---|
| Urbanity | Quality of being urban in each state | 2010 |
| Unemployment | Unemployment rate in each state | 2009-2017 |
| GDP_Per_Capita | GDP divided by population | 2005-2017 |
| Union | Union density in each state | 2005-2017 |
| Voe | Number of people in vocational training | 2007-2017 |
| Racial(black/hispanic/white) | Racial proportion for each race | 2005-2017 |
| Totalpop | Total population | 2009-2017 |
| Studentpop | Student population | 2009-2017 |
Concerns:
- The variables' time spans are not consistent, so we concentrate on the 2009-2017 period.
- Because the dataset only has urbanity for 2010, the 2010 value is used to represent urbanity for 2009-2017.
We start with union density and high school and produce a scatter plot after removing some outliers. The plot does not show an obvious negative relationship between the two variables.
Next, a bubble plot is presented. As in the previous plot, union density is the independent variable and high school is the dependent variable, but this time each point is drawn as a circle: each circle represents one state, and the radius of the circle stands for the level of racial diversity, so the higher the racial diversity, the larger the circle. From this plot we cannot confirm that the two variables have a negative relationship either.
To better understand this question, models are fitted, and the next section describes this part in detail.
Since the initial EDA does not show a distinctive negative relationship between vocational training and unionization, the original model used in the second-year paper is replicated:
##
## Call:
## lm(formula = Highschool ~ Union_density + GDP + Urbanity + Population +
## Export + Unemployment_rate + Cost_of_living + Gini, data = data3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -261208 -64360 -25121 37190 514055
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.754e+05 5.014e+05 0.948 0.34852
## Union_density -1.692e+04 5.694e+03 -2.971 0.00489 **
## GDP 6.629e-02 1.358e+00 0.049 0.96129
## Urbanity -1.045e+03 1.826e+03 -0.572 0.57028
## Population 2.804e-02 3.834e-03 7.313 5.2e-09 ***
## Export -5.176e-01 5.463e-01 -0.947 0.34882
## Unemployment_rate 1.909e+04 2.453e+04 0.778 0.44086
## Cost_of_living 1.241e+03 1.669e+03 0.743 0.46136
## Gini -9.598e+05 1.142e+06 -0.840 0.40546
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 144200 on 42 degrees of freedom
## Multiple R-squared: 0.67, Adjusted R-squared: 0.6072
## F-statistic: 10.66 on 8 and 42 DF, p-value: 5.037e-08
A partial correlation test is then used to check whether the negative relationship between Union density and Highschool holds in the primary model. Partial correlation measures the strength and direction of a linear relationship between two continuous variables while controlling for the effect of other continuous variables.
Controlling for the seven confounding variables in the primary model, we used two methods for this test. One is the Pearson method, which evaluates the linear relationship between two continuous variables. A relationship is linear when a change in one variable is associated with a proportional change in the other variable. The other is the Spearman method, which evaluates the monotonic relationship between two continuous or ordinal variables. In a monotonic relationship, the variables tend to change together, but not necessarily at a constant rate. That two variables change together does not mean that a change in one causes a change in the other.
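The idea can be sketched in base R as follows. This is not necessarily how the values in this report were produced: the helper `partial_cor`, the simulated data frame `dat` and its columns are all hypothetical. The partial correlation is computed as the correlation between the residuals of the two variables after each has been regressed on the controls; for the Spearman variant the sketch assumes that all variables are first replaced by their ranks.

```r
# Hypothetical simulated data; none of these values come from the report.
set.seed(1)
n  <- 51
z1 <- rnorm(n)                          # control variable 1
z2 <- rnorm(n)                          # control variable 2
x  <- 0.5 * z1 + rnorm(n)               # stand-in for Union_density
y  <- -0.4 * x + 0.8 * z2 + rnorm(n)    # stand-in for Highschool
dat <- data.frame(x, y, z1, z2)

# Partial correlation of x and y given the controls, via residuals.
partial_cor <- function(data, x, y, controls, method = "pearson") {
  if (method == "spearman") data[] <- lapply(data, rank)  # work on ranks
  rx <- residuals(lm(reformulate(controls, response = x), data = data))
  ry <- residuals(lm(reformulate(controls, response = y), data = data))
  cor(rx, ry)
}

controls   <- c("z1", "z2")
r_pearson  <- partial_cor(dat, "x", "y", controls)
r_spearman <- partial_cor(dat, "x", "y", controls, method = "spearman")

# t test for the Pearson partial correlation with n - 2 - k degrees of
# freedom, where k is the number of control variables.
k      <- length(controls)
t_stat <- r_pearson * sqrt((n - 2 - k) / (1 - r_pearson^2))
p_val  <- 2 * pt(-abs(t_stat), df = n - 2 - k)

print(c(pearson = r_pearson, spearman = r_spearman, p_value = p_val))
```

A partial correlation always lies between -1 and 1, and the Pearson t statistic computed this way equals the t value of `x` in the regression of `y` on `x` and the controls.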
Pearson Method
## [1] -0.4167175
## [1] 0.002351165
Spearman Method
## [1] -2.7575
## [1] 0.008161668
From the output of the two methods, both reported values are negative and the p-values are small, which indicates that the negative coefficient of Union density is statistically significant in the primary model. In the plot of the residuals of Union density against the residuals of Highschool, the red line has an apparent downward trend. However, looking at the points on the graph, there is one isolated point in the lower right corner.
##
## Call:
## lm(formula = log(Highschool) ~ Union_density + log(GDP) + Urbanity +
## log(Population) + log(Export) + Unemployment_rate + log(Cost_of_living) +
## Gini, data = data3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8035 -0.3491 0.1601 0.3786 1.1588
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.198356 6.701335 0.776 0.442
## Union_density -0.043265 0.027298 -1.585 0.120
## log(GDP) -0.438462 0.545691 -0.803 0.426
## Urbanity 0.007762 0.009767 0.795 0.431
## log(Population) 0.950302 0.146685 6.479 8.14e-08 ***
## log(Export) -0.005278 0.092754 -0.057 0.955
## Unemployment_rate 0.018547 0.112717 0.165 0.870
## log(Cost_of_living) -0.632347 0.983668 -0.643 0.524
## Gini -1.585547 5.110432 -0.310 0.758
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6602 on 42 degrees of freedom
## Multiple R-squared: 0.7649, Adjusted R-squared: 0.7202
## F-statistic: 17.09 on 8 and 42 DF, p-value: 5.888e-11
Because the original model is not in line with the model assumptions, data transformation is used to adjust the model. The most important assumption of the linear regression model is that each variable has a linear relationship with the outcome. The component residual plot is used to check whether every variable satisfies this linearity assumption. The red line shows the pattern of the data points for each variable, and the dashed line indicates the line of best fit. If the red line is close to the dashed line and does not show a curved pattern, the linearity assumption is satisfied for that variable.

Based on the initial component residual plot of the original model, leverage points exist for several variables, including “GDP”, “population”, “Export”, “unemployment rate” and “cost of living”. Leverage points are a problem because they directly change the pattern of the residual plots. For example, in the residual plot of “Export”, if the leverage points move upwards or downwards, the red line and the dashed line move accordingly. That movement is caused by the leverage points rather than by the bulk of the data points, whereas ideally the red line should reflect the pattern of the majority of data points.

To reduce the influence of the leverage points, log transformation is used to build the first model, “fit1”. As the model call above shows, the log transformation is applied to the outcome, “highschool”, and to “GDP”, “population”, “Export” and “cost of living”; “unemployment rate” enters untransformed. In the summary of “fit1”, the coefficient of “union density” is -0.04 and is not significant (p = 0.120). Because the dataset is a sample from the population, the estimated coefficient would differ from sample to sample because of sampling variability. The insignificant coefficient indicates that the negative relationship between vocational training and union density is not conclusive.
After modeling, diagnostic plots are generated to check the validity of the model. In the residual plot, nearly all the residuals are evenly scattered around the horizontal line at zero, indicating that the residuals are random and the linearity assumption is satisfied. In the Q-Q plot, the majority of the residual points are aligned with the straight line, indicating that the normality assumption is nearly satisfied. In the scale-location plot, most residuals are spread equally along the range of fitted values, indicating that the equal variance assumption is satisfied. Based on these plots, the model “fit1” is more valid than the original model. In the component residual plot, the issue with leverage points has improved considerably, but two issues remain. First, there is still a leverage point in GDP after the log transformation. Second, union density shows a slightly curved pattern, indicating that the linearity assumption is not well satisfied and that a more robust model might be considered.
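A minimal base R sketch of the two tools discussed here, log transformation and component residual (partial residual) values, is given below. It runs on simulated data: the data frame `toy` and its columns `hs`, `pop` and `union` are hypothetical stand-ins, and the report's own plots may have been produced with a different function or package.

```r
# Hypothetical simulated data with one right-skewed predictor.
set.seed(2)
n     <- 51
pop   <- exp(rnorm(n, mean = 15, sd = 1))   # skewed, like Population
union <- runif(n, 2, 25)
hs    <- exp(-4 + 0.95 * log(pop) - 0.03 * union + rnorm(n, sd = 0.5))
toy   <- data.frame(hs, pop, union)

fit_raw <- lm(hs ~ union + pop, data = toy)            # untransformed model
fit_log <- lm(log(hs) ~ union + log(pop), data = toy)  # log-transformed model

# Hat values measure leverage; the largest value in each model can be compared.
max_lev_raw <- max(hatvalues(fit_raw))
max_lev_log <- max(hatvalues(fit_log))

# Partial residuals: one column per model term, each equal to the model
# residual plus that term's (centered) fitted contribution.
pr <- residuals(fit_log, type = "partial")

# A component residual plot for union density could then be drawn with:
# plot(toy$union, pr[, "union"]); abline(lm(pr[, "union"] ~ toy$union), lty = 2)
# lines(lowess(toy$union, pr[, "union"]), col = "red")
print(c(raw = max_lev_raw, log = max_lev_log))
```

The dashed least-squares line through the partial residuals has the same slope as the term's coefficient in the full model, which is why a red smoother that bends away from it signals non-linearity in that variable.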
##
## Call:
## lm(formula = log(Highschool) ~ Union_density + log(GDP) + Urbanity +
## log(Population) + log(Export) + Unemployment_rate + log(Cost_of_living) +
## Gini, data = data4)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.77486 -0.31688 0.03988 0.41357 1.16933
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.757715 8.934208 -0.309 0.7591
## Union_density -0.059829 0.029775 -2.009 0.0511 .
## log(GDP) 0.119772 0.684302 0.175 0.8619
## Urbanity 0.007271 0.009686 0.751 0.4571
## log(Population) 0.889431 0.152379 5.837 7.38e-07 ***
## log(Export) -0.008890 0.091954 -0.097 0.9234
## Unemployment_rate 0.052305 0.114538 0.457 0.6503
## log(Cost_of_living) -0.397213 0.990627 -0.401 0.6905
## Gini 2.270601 5.834118 0.389 0.6991
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6542 on 41 degrees of freedom
## Multiple R-squared: 0.7502, Adjusted R-squared: 0.7014
## F-statistic: 15.39 on 8 and 41 DF, p-value: 3.764e-10
To deal with the leverage point in log GDP, one solution is to leave it out. The leverage point is Washington DC, which has the highest log GDP. Another model (“fit2”) with exactly the same variables and outcome is fitted to a new dataset that excludes Washington DC. In the summary of “fit2”, the coefficient of union density changes from -0.043 to -0.060, a shift of more than 20% that cannot be ignored. Thus, the leverage point of Washington DC cannot simply be left out.
Checking the partial correlation for the transformed model, the red line is flatter than in the original model.
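The leave-one-out check described above can be sketched in base R as follows. The data are simulated and hypothetical: `toy`, its columns, and the artificially inflated last observation only imitate the role of a single high-leverage unit and are not the report's data.

```r
# Hypothetical simulated data with one artificially extreme GDP value.
set.seed(3)
n      <- 51
gdp    <- exp(rnorm(n, mean = 10.8, sd = 0.2))
gdp[n] <- 3 * max(gdp[-n])               # one extreme unit
union  <- runif(n, 2, 25)
y      <- 5 - 0.04 * union + 0.3 * log(gdp) + rnorm(n, sd = 0.6)
toy    <- data.frame(y, union, gdp)

fit_all <- lm(y ~ union + log(gdp), data = toy)

# Identify the observation with the highest leverage (hat value).
h     <- hatvalues(fit_all)
i_max <- which.max(h)

# Refit the same formula without that observation.
fit_drop <- lm(formula(fit_all), data = toy[-i_max, ])

# Relative change in the union coefficient caused by dropping the point.
b_all      <- unname(coef(fit_all)["union"])
b_drop     <- unname(coef(fit_drop)["union"])
rel_change <- (b_drop - b_all) / abs(b_all)

print(c(all = b_all, dropped = b_drop, relative_change = rel_change))
```

A large relative change suggests that the conclusion depends on a single observation, so that both fits are worth reporting rather than silently dropping the point.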
Pearson Method
## [1] -0.2375583
## [1] 0.09323562
Spearman Method
## [1] -0.2179186
## [1] 0.1244902
According to the output of the two methods, the partial correlations between Union density and vocational training are smaller in magnitude, and the p-values are larger. Thus, in the transformed model, the negative relationship between the two factors is not as strong as in the original model.
The final model adds racial diversity to the transformed model, since the main question is whether racial diversity can explain the negative relationship between unionization and vocational training.
The data frame only contains the proportions of Black, White, and Hispanic residents. First, using the formula from the article “How to Measure Density When You Must,” the proportion of same race is calculated as the sum of the squared Black, White, and Hispanic proportions; the proportion of different race, which is the racial diversity, equals one minus the proportion of same race. However, in the real world, different states may have different numbers of racial groups, and the number of groups affects the resulting diversity value. To alleviate this problem, racial diversity is normalized before being added to the model. Multiplying racial diversity by C/(C-1), where C is the total number of racial groups, gives the normalized racial diversity. In this case, C is 3.
\[\text{Racial Diversity} = 1 - (\text{Black}^2 + \text{White}^2 + \text{Hispanic}^2)\]
\[\text{NormRacialDiversity} = \frac{C}{C-1} \times \text{Racial Diversity}\]
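As a small illustration of the two formulas, the sketch below computes the normalized index in base R. The function name `norm_racial_diversity` and the three example rows are hypothetical, and the proportions are assumed to be fractions between 0 and 1 rather than percentages.

```r
# Normalized diversity index: C / (C - 1) * (1 - sum of squared proportions),
# where C is the number of groups passed in.
norm_racial_diversity <- function(...) {
  p <- cbind(...)                  # one column per racial group
  C <- ncol(p)                     # number of groups (3 in this report)
  diversity <- 1 - rowSums(p^2)    # share of "different race" pairs
  C / (C - 1) * diversity
}

# Three hypothetical states: fully homogeneous, two equal groups,
# and three equal groups.
states <- data.frame(
  White    = c(1, 0.5, 1 / 3),
  Black    = c(0, 0.5, 1 / 3),
  Hispanic = c(0, 0,   1 / 3)
)
states$norm_Racial_diversity <-
  with(states, norm_racial_diversity(White, Black, Hispanic))

print(states)
# norm_Racial_diversity is 0, 0.75 and 1 for the three rows
```

The normalization rescales the index so that it reaches 1 when all C groups are equally large, which makes values comparable across settings with different numbers of groups.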
##
## Call:
## lm(formula = log(Highschool) ~ Union_density + log(GDP) + Urbanity +
## log(Population) + log(Export) + Unemployment_rate + log(Cost_of_living) +
## Gini + norm_Racial_diversity, data = data3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.11000 -0.29731 0.02178 0.33166 1.05237
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.446738 6.396034 1.164 0.2510
## Union_density -0.024490 0.026891 -0.911 0.3678
## log(GDP) -0.053305 0.538701 -0.099 0.9217
## Urbanity -0.006738 0.010946 -0.616 0.5416
## log(Population) 0.798928 0.151591 5.270 4.68e-06 ***
## log(Export) 0.097660 0.097082 1.006 0.3203
## Unemployment_rate -0.130898 0.122559 -1.068 0.2918
## log(Cost_of_living) -1.709210 1.026990 -1.664 0.1037
## Gini -1.011016 4.833292 -0.209 0.8353
## norm_Racial_diversity 1.590066 0.645692 2.463 0.0181 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6236 on 41 degrees of freedom
## Multiple R-squared: 0.7952, Adjusted R-squared: 0.7503
## F-statistic: 17.69 on 9 and 41 DF, p-value: 1.711e-11
The diagnostics of the model with normalized racial diversity are similar to those of the transformed model; the main change is that the adjusted R-squared increases slightly, from 0.7202 to 0.7503. The adjusted R-squared is a modified version of R-squared that increases only if the new term improves the model more than would be expected by chance and decreases when a predictor improves the model by less than expected by chance. Therefore, after adding normalized racial diversity, the new model fits better than the transformed model.
Checking the partial correlation for the third model, the red line is very close to horizontal.
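As a rough check on this comparison, the usual definition of adjusted R-squared can be applied to the printed values. The symbols \(n\) (number of observations) and \(p\) (number of predictors) are notation introduced here; from the degrees of freedom in the two summaries, both models have \(n = 51\), with \(p = 8\) for the transformed model and \(p = 9\) after adding normalized racial diversity.

\[\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}\]
\[\bar{R}^2_{\text{transformed}} \approx 1 - (1 - 0.7649)\,\frac{50}{42} \approx 0.720, \qquad \bar{R}^2_{\text{diversity}} \approx 1 - (1 - 0.7952)\,\frac{50}{41} \approx 0.750\]

Both values agree with the summaries up to rounding of the printed R-squared. The penalty factor grows when a predictor is added, so the adjusted value rises only because R-squared itself rose by more than the penalty.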
Pearson Method
## [1] -0.1242534
## [1] 0.3849993
Spearman Method
## [1] -0.1408109
## [1] 0.09939286
From the output of the two methods, the partial correlations between Union density and vocational training are close to zero and the p-values are large, which indicates that the relationship between these two factors is not statistically significant. However, the coefficient of union density was already not significant before normalized racial diversity was added to the model, so racial diversity cannot explain the change in this negative relationship very well.
#Create a new dataset that contains only the data in the year of 2010 (newdata2)
##
## Call:
## lm(formula = Voe ~ Union + Urbanity + Unemployment + GDP_Per_Capita +
## Totalpop, data = newdata2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -106386 -54424 -13568 33851 237874
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.088e+05 1.282e+05 1.629 0.1135
## Union -6.044e+03 2.940e+03 -2.056 0.0483 *
## Urbanity -1.824e+03 1.426e+03 -1.279 0.2103
## Unemployment -1.392e+04 9.042e+03 -1.540 0.1338
## GDP_Per_Capita 9.359e-01 2.052e+00 0.456 0.6515
## Totalpop 1.846e-02 3.265e-03 5.653 3.3e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 78000 on 31 degrees of freedom
## Multiple R-squared: 0.5737, Adjusted R-squared: 0.5049
## F-statistic: 8.343 on 5 and 31 DF, p-value: 4.365e-05
#transformation
##
## Call:
## lm(formula = log(Voe) ~ Union + Urbanity + Unemployment + log(GDP_Per_Capita) +
## log(Totalpop), data = newdata2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.1238 -0.6101 -0.1354 0.2171 1.9219
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.65379 13.76133 0.193 0.848339
## Union -0.02057 0.03218 -0.639 0.527373
## Urbanity 0.01019 0.01839 0.554 0.583630
## Unemployment -0.11309 0.11422 -0.990 0.329785
## log(GDP_Per_Capita) -0.67745 1.23285 -0.549 0.586604
## log(Totalpop) 1.00888 0.25223 4.000 0.000365 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8618 on 31 degrees of freedom
## Multiple R-squared: 0.6182, Adjusted R-squared: 0.5566
## F-statistic: 10.04 on 5 and 31 DF, p-value: 8.748e-06
##
## Call:
## lm(formula = log(Voe) ~ Union + Urbanity + Unemployment + log(GDP_Per_Capita) +
## log(Totalpop) + norm_Racial_diversity, data = newdata2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.0568 -0.6616 -0.1787 0.2531 1.9217
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.52911 15.72551 -0.224 0.823952
## Union -0.03351 0.03594 -0.932 0.358573
## Urbanity 0.01069 0.01849 0.578 0.567653
## Unemployment -0.07483 0.12378 -0.605 0.550046
## log(GDP_Per_Capita) -0.17485 1.38037 -0.127 0.900049
## log(Totalpop) 1.07513 0.26590 4.043 0.000339 ***
## norm_Racial_diversity -0.81309 0.98377 -0.827 0.415040
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8663 on 30 degrees of freedom
## Multiple R-squared: 0.6267, Adjusted R-squared: 0.552
## F-statistic: 8.393 on 6 and 30 DF, p-value: 2.193e-05
The dataset used to build the all-years model only includes data from 2009 to 2017, because the raw dataset has missing values from 2003 to 2008. Because the new data have only one measurement of urbanity, this model assumes that urbanity is the same in all years.
##
## Call:
## lm(formula = log(Voe) ~ Union + Urbanity + Unemployment + log(GDP_Per_Capita) +
## log(Totalpop) + factor(Year), data = newdata3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.5245 -0.5615 -0.1621 0.5521 2.1095
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.156130 4.211933 -0.037 0.97046
## Union -0.035833 0.010926 -3.280 0.00117 **
## Urbanity 0.006495 0.006699 0.970 0.33305
## Unemployment -0.172147 0.039589 -4.348 1.91e-05 ***
## log(GDP_Per_Capita) -0.397262 0.360667 -1.101 0.27163
## log(Totalpop) 1.053662 0.080729 13.052 < 2e-16 ***
## factor(Year)2010 0.020848 0.180272 0.116 0.90801
## factor(Year)2011 -0.286825 0.187695 -1.528 0.12759
## factor(Year)2012 -0.242713 0.199784 -1.215 0.22542
## factor(Year)2013 -0.124493 0.196571 -0.633 0.52703
## factor(Year)2014 -0.492981 0.213243 -2.312 0.02150 *
## factor(Year)2015 -0.452687 0.221018 -2.048 0.04146 *
## factor(Year)2016 -0.585156 0.246768 -2.371 0.01839 *
## factor(Year)2017 -0.653285 0.262517 -2.489 0.01340 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.821 on 284 degrees of freedom
## Multiple R-squared: 0.608, Adjusted R-squared: 0.59
## F-statistic: 33.88 on 13 and 284 DF, p-value: < 2.2e-16
Comparing the all-years model with the one-year model, the coefficient of union density keeps the same sign and a similar magnitude: -0.021 in the 2010 model and -0.036 in the all-years model, where it is now significant (p = 0.00117). Because the outcome is log(Voe), a coefficient of -0.036 means that a one-unit increase in Union is associated with a decrease in expected Voe of about 3.5%, holding the other variables fixed. The year 2009 is the reference level and is absorbed into the intercept. The coefficient of 2010 indicates that log(Voe) in 2010 is 0.021 higher than in 2009; the coefficient of 2011 means that log(Voe) in 2011 is 0.287 lower than in 2009; and so on.
In the Q-Q plot, there is one outlier in the lower left, which is the observation for Louisiana in 2010. However, it is unreasonable to remove a single year of data for one state, so this outlier remains in the following model.
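The way `factor(Year)` and a log outcome are read can be sketched in base R. The panel below is simulated and hypothetical (`panel`, `year_shift` and all generated values are assumptions for illustration); only the coefficient -0.035833 in the last step is taken from the all-years output above.

```r
# Hypothetical simulated panel: 30 units observed in three years.
set.seed(4)
panel <- expand.grid(state = 1:30, Year = 2009:2011)
panel$Union <- runif(nrow(panel), 2, 25)
year_shift  <- c("2009" = 0, "2010" = 0.1, "2011" = -0.2)
panel$logVoe <- 10 - 0.03 * panel$Union +
  unname(year_shift[as.character(panel$Year)]) +
  rnorm(nrow(panel), sd = 0.3)

fit <- lm(logVoe ~ Union + factor(Year), data = panel)

# The first year (2009) has no coefficient of its own: it is the reference
# level, and the other year coefficients are differences from it.
coef_names <- names(coef(fit))
print(coef_names)
# "(Intercept)" "Union" "factor(Year)2010" "factor(Year)2011"

# With a log outcome, a coefficient b corresponds to a change of
# 100 * (exp(b) - 1) percent in the outcome per one-unit increase.
pct_change <- 100 * (exp(coef(fit)[-1]) - 1)

# Applied to the Union coefficient reported in the all-years model:
union_pct <- 100 * (exp(-0.035833) - 1)
print(round(union_pct, 2))
# about -3.52
```

For small coefficients the percentage change is close to 100 times the coefficient, which is why -0.036 reads as roughly a 3.5% decrease; for larger coefficients such as the later year effects, the exponential form should be used.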
fit5 <- lm(log(Voe) ~ Union + Urbanity + Unemployment + log(GDP_Per_Capita) + log(Totalpop) + norm_Racial_diversity + factor(Year), data = newdata3)
summary(fit5)
##
## Call:
## lm(formula = log(Voe) ~ Union + Urbanity + Unemployment + log(GDP_Per_Capita) +
## log(Totalpop) + norm_Racial_diversity + factor(Year), data = newdata3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.5110 -0.5791 -0.1541 0.5455 2.0881
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.621067 4.343951 -0.143 0.886413
## Union -0.037625 0.011651 -3.229 0.001387 **
## Urbanity 0.007283 0.006935 1.050 0.294545
## Unemployment -0.165275 0.042514 -3.888 0.000126 ***
## log(GDP_Per_Capita) -0.362325 0.369517 -0.981 0.327660
## log(Totalpop) 1.058677 0.081616 12.971 < 2e-16 ***
## norm_Racial_diversity -0.137486 0.307229 -0.448 0.654854
## factor(Year)2010 0.016554 0.180781 0.092 0.927107
## factor(Year)2011 -0.285283 0.187991 -1.518 0.130248
## factor(Year)2012 -0.237300 0.200431 -1.184 0.237426
## factor(Year)2013 -0.123375 0.196864 -0.627 0.531360
## factor(Year)2014 -0.481869 0.214982 -2.241 0.025773 *
## factor(Year)2015 -0.434703 0.224949 -1.932 0.054301 .
## factor(Year)2016 -0.561973 0.252488 -2.226 0.026820 *
## factor(Year)2017 -0.626144 0.269793 -2.321 0.021006 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8222 on 283 degrees of freedom
## Multiple R-squared: 0.6082, Adjusted R-squared: 0.5889
## F-statistic: 31.38 on 14 and 283 DF, p-value: < 2.2e-16
par(mfrow=c(2,2))
plot(fit5)
After adding normalized racial diversity to the all-years model, the coefficient of union density changes only slightly, from -0.0358 to -0.0376, the coefficient of normalized racial diversity is not significant (p = 0.655), and the four diagnostic plots are similar to those of the previous model. Therefore, it is hard to say that racial diversity can explain the negative relationship between union density and Voe.